Relevance Weighting Using Distance Between Term Occurrences
Abstract
Recent work has achieved promising retrieval performance using distance between term occurrences as a primary estimator of document relevance. A major benefit of this approach is that relevance scoring does not rely on collection frequency statistics. A theoretical framework for lexical spans is now proposed which encompasses these approaches and suggests a number of important directions for future experimental work. Based on the formalism, approaches to issues such as scoring partial spans, treatment of repeated term occurrences within spans, and the importance of ordering are proposed. Consideration is given to the practical application of the formalism to both locating and scoring concept intersections and to locating phrases (with an estimate of confidence) despite intervening or substituted words.
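As a rough illustration of scoring by distance between term occurrences, the sketch below finds the shortest window of a document that contains every query term and scores the document by how tightly the terms cluster. This is not the formalism proposed in the paper: the function names `shortest_span` and `span_score` and the scoring rule (number of distinct query terms divided by window length) are assumptions made for illustration only. Note that the score uses no collection frequency statistics.

```python
from collections import Counter


def shortest_span(tokens, query_terms):
    """Return (start, end) of the shortest window holding every query term, or None."""
    needed = set(query_terms)
    if not needed:
        return None
    counts = Counter()   # occurrences of each query term inside the window
    covered = 0          # number of distinct query terms inside the window
    best = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok not in needed:
            continue
        counts[tok] += 1
        if counts[tok] == 1:
            covered += 1
        # Shrink the window from the left while it still covers all terms.
        while covered == len(needed):
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            left_tok = tokens[left]
            if left_tok in needed:
                counts[left_tok] -= 1
                if counts[left_tok] == 0:
                    covered -= 1
            left += 1
    return best


def span_score(tokens, query_terms):
    """Hypothetical score: distinct query terms divided by shortest span length."""
    span = shortest_span(tokens, query_terms)
    if span is None:
        return 0.0  # no full span; partial spans are not handled in this sketch
    length = span[1] - span[0] + 1
    return len(set(query_terms)) / length


doc = "the quick brown fox jumps over the lazy dog near the quick red fox".split()
print(shortest_span(doc, ["quick", "fox"]))
print(span_score(doc, ["quick", "fox"]))
```

The sketch returns zero when any query term is missing; scoring partial spans, handling repeated terms within a span, and accounting for term order are the issues the abstract says the formalism addresses, and they are deliberately left out here.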
Similar resources
Heading-Aware Proximity Measure and Its Application to Web Search
Proximity of query keyword occurrences is an important piece of evidence for effective query-biased document scoring. If one query keyword occurs close to another in a document, this suggests that the document is highly relevant to the query. The simplest way to measure proximity between keyword occurrences is to use the distance between them, i.e., the difference of their positions. However, most web pag...
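The simplest measure mentioned above, the difference of positions between two keyword occurrences, can be sketched as follows. This is a minimal example under stated assumptions: the function name `min_distance` is hypothetical, tokens are plain whitespace-split words, and the heading-aware measure of the paper itself is not reproduced here.

```python
def min_distance(tokens, a, b):
    """Smallest position difference between any occurrence of a and any of b.

    Returns None if either keyword is absent (or if a and b are the same word).
    """
    last_a = last_b = None
    best = None
    for i, tok in enumerate(tokens):
        if tok == a:
            last_a = i
        if tok == b:
            last_b = i
        # Compare the most recent occurrence of each keyword seen so far.
        if last_a is not None and last_b is not None and last_a != last_b:
            d = abs(last_a - last_b)
            if best is None or d < best:
                best = d
    return best


doc = "perl tutorial for beginners learning perl".split()
print(min_distance(doc, "perl", "tutorial"))
print(min_distance(doc, "perl", "beginners"))
```

A single left-to-right pass suffices because the closest partner of any occurrence on its left is always the most recent occurrence of the other keyword.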
Term frequency with average term occurrences for textual information retrieval
In the context of Information Retrieval (IR) from text documents, the term-weighting scheme (TWS) is a key component of the matching mechanism when using the vector space model (VSM). In this paper we propose a new TWS that is based on computing the average term occurrences of terms in documents and it also uses a discriminative approach based on the document centroid vector to remove less sign...
Term Distillation for Cross-DB Retrieval
In cross-DB retrieval, the domain of queries differs from that of the retrieval target in the distribution of term occurrences. This causes incorrect term weighting in the retrieval system, which assigns to each term a retrieval weight based on the distribution of term occurrences. To resolve the problem, we propose "term distillation", which is a framework for query term selection in cross-DB ret...
Term Distillation In Patent Retrieval
In cross-database retrieval, the domain of queries differs from that of the retrieval target in the distribution of term occurrences. This causes incorrect term weighting in the retrieval system, which assigns to each term a retrieval weight based on the distribution of term occurrences. To resolve the problem, we propose "term distillation", a framework for query term selection in cross-database...
Relation Based Term Weighting Regularization
Traditional retrieval models compute term weights based only on information related to individual terms, such as TF and IDF. However, query terms are related. Intuitively, these relations could provide useful information about the importance of a term in the context of other query terms. For example, the query “perl tutorial” specifies that a user is looking for information relevant to both perl and t...